Abstract

It is commonly acknowledged that temporal expression extractors are important components of larger natural language processing 
systems like information retrieval and question answering systems. The extraction and normalization of temporal expressions in Turkish
have received little attention so far, apart from the extraction of some date and time expressions in the course of named entity recognition.
As TimeML is the current standard of temporal expression and event annotation in natural language texts, in this paper, we present an 
analysis of temporal expressions in Turkish based on the related TimeML classification (i.e., date, time, duration, and set expressions). 
We have created a lexicon for Turkish temporal expressions and devised considerably wide-coverage patterns using the lexical classes 
as the building blocks. We believe that the proposed patterns, together with convenient normalization rules, can be readily used by 
prospective temporal expression extraction tools for Turkish.

1. Introduction


Temporal expressions in natural language texts stand as one of the crucial pieces of information to be extracted from
these texts. Accordingly, several text analysis applications, like event extractors (Ritter et al., 2012) and text-based
video annotation systems (Küçük and Yazıcı, 2011), include a temporal expression extractor as a submodule to identify,
normalize and then make use of these expressions.


Traditionally, temporal expressions like some date and time expressions have been considered as named entities and have
been included in the scope of named entity recognition (NER) systems. For instance, in the Message Understanding
Conference (MUC) series (Grishman and Sundheim, 1996), which have been conducted for several years to promote
research in information extraction, some date and time expressions were considered within the scope of the NER task and,
in the related guidelines, these expressions are recommended to be annotated with the TIMEX tag. But within the scope of
MUC, only the identification of these temporal expressions was required without the need for their normalization.


TimeML is a standard markup language for annotating temporal expressions and events (Pustejovsky et al., 2003a) which
is built upon previous work on the annotation of temporal expressions such as (Ferro et al., 2001; Setzer, 2001). According
to the current TimeML guideline (Saurí et al., 2005), the TIMEX3 tag is used to annotate the temporal expressions identified
and the normalized forms of the expressions are also specified within the annotations. Additionally, the SIGNAL tag is
used to annotate the temporal relations between two temporal expressions, two events, or a temporal expression and an
event. There are mainly four distinct types of temporal expressions within the scope of TimeML: date, time, set, and duration
(Saurí et al., 2005). Hence, the extent of the temporal expressions considered in TimeML is also broader compared to the
extent considered in the MUC series, in addition to the normalization procedure introduced.


There are several temporal expression extraction and normalization systems, as reported in studies like
(UzZaman et al., 2013). One of the earliest such systems is GUTime, which is the temporal expression recognition
and normalization module of a larger system called TARSQI which annotates temporal expressions, relations,
and events in news texts (Verhagen et al., 2005). Several of the system proposals so far, including Edinburgh-LTG
(Grover et al., 2010), HeidelTime (Strötgen and Gertz, 2010), SUTime (Chang and Manning, 2013), and FSS-TimEx
(Zavarella and Tanev, 2013) are rule-based systems and some of them, such as HeidelTime, have been extended to extract
temporal expressions in other languages including Arabic, Italian, Spanish, and Vietnamese (Strötgen et al., 2014). As
previously pointed out, the extraction of some temporal expressions has long been considered a subtask of named entity
recognition, and accordingly, some of the aforementioned systems like Edinburgh-LTG and SUTime are based on previous
NER systems.


In addition to the system proposals, as there is a need for corpora annotated with temporal expressions, relations, and
events, resources like TimeBank (Pustejovsky et al., 2003b) have emerged. TimeBank has been commonly used to
evaluate and compare different system proposals. Similar annotated resources have also been constructed for other languages,
such as French TimeBank (Bittar et al., 2011), Spanish TimeBank (Saurí and Badia, 2012), and Italian Timebank
(Caselli et al., 2011). Such resources are indispensable for training the extraction systems proposed in addition to the
common evaluation and thereby comparison of different system proposals, and to the best of our knowledge, no such
annotated resource exists for Turkish.


Considering the related tools on Turkish, extraction of date and time expressions has been performed by the rule-based NER
system (Küçük and Yazıcı, 2009) and its extended versions like (Küçük and Yazıcı, 2012), mostly following the named
entity definition of the MUC series and extracting some deictic date expressions as well, without normalization. These
experiments have been performed on diverse text genres such as news articles, historical texts, and child stories. Within a
text-based semantic video annotation system, which makes use of this NER system, a separate date normalization module
has been implemented to normalize only the deictic date expressions using the creation dates of the corresponding videos
as reference dates (Küçük and Yazıcı, 2011). Within the course of this latter study, extraction experiments are performed on
automatically obtained news video texts which are mostly noisy (due to the character recognition errors introduced during
the sliding text recognition procedure employed). Recently, date and time expressions are also recognized in informal
texts (i.e., tweets) in Turkish using the aforementioned rule-based system, as described in (Küçük and Steinberger, 2014).


Another related work is presented in (Şeker and Diri, 2010), where the authors have considered temporal logic and event
times in Turkish based on existing temporal models, yet, it does not aim to propose a temporal expression extractor or
related resource for Turkish.


In this paper, we provide an analysis of the temporal expressions in Turkish, following the corresponding TimeML
classification. We mainly provide several wide-coverage patterns for the extraction of these expressions together with sample
expressions and their annotated forms with the TIMEX3 tag. With the presented lexicon, pattern bases, and the review of
the related limited literature on Turkish, we believe that this paper can be used as a guideline before building a temporal
expression extraction and normalization system for Turkish. The rest of the paper is organized as follows: In Section 2, a
compact temporal lexicon in Turkish and patterns for the extraction of temporal expressions in Turkish are presented
together with several samples. Section 3 lists the open issues on temporal expression extraction and normalization in Turkish
texts and Section 4 concludes the paper.

2. Temporal Expressions in Turkish 

Before presenting the lexical resources and patterns for temporal expressions in Turkish, we briefly summarize two
particularities of these expressions in formal Turkish texts, which should also be considered during system development. These
writing rules are provided below, following the corresponding language rules published by Türk Dil Kurumu (‘Turkish Language
Association’) (TDK, 2015):

• The tokens within temporal expressions are all in lowercase, except the names of the months and week days which
have their initial letters capitalized. Sample expressions are bugün (‘today’), yarından sonraki gün (‘the day after
tomorrow’), Pazartesi sabahı (‘Monday morning’), Mayıs ayının ikinci Pazar günü (‘the second Sunday of (the month
of) May’).

• The suffixes attached at the ends of the tokens of the temporal expressions are not separated from these tokens.
The names of the months and week days and numerals constitute the exceptions to this rule, as the sequence
of suffixes added to the ends of these is separated from them with an apostrophe. In the illustrative temporal expression
2015 yılının Mart’ının 23’ü (‘the 23rd of March of (the year of) 2015’), the sequence of suffixes attached at the end
of yıl (‘year’) is not separated from it while the ones attached at the ends of the numeral (23) and Mart (‘March’) are
separated with apostrophes.
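
The second writing rule suggests a simple preprocessing step that splits a token into its stem and its apostrophe-separated suffix sequence. The following is a minimal sketch of such a step; the function name and the decision to accept both the straight and the typographic apostrophe are our assumptions for illustration, not part of the rules published by TDK.

```python
import re

# Both the typewriter apostrophe and the typographic one may occur in texts.
APOSTROPHES = "'\u2019"

# A token is a stem, optionally followed by an apostrophe and a suffix sequence.
TOKEN_RE = re.compile(rf"^([^{APOSTROPHES}]+)(?:[{APOSTROPHES}](.+))?$")


def split_apostrophe_suffix(token):
    """Return (stem, suffixes); suffixes is '' when no apostrophe is present."""
    match = TOKEN_RE.match(token)
    if match is None:
        # Tokens that start with an apostrophe are returned unchanged.
        return token, ""
    return match.group(1), match.group(2) or ""
```

Applied to the tokens of 2015 yılının Mart’ının 23’ü, the sketch separates the suffixes of Mart and 23 but leaves yılının intact, since suffixes written without an apostrophe still require morphological analysis.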

As mentioned in Section 1, there are four distinct types of temporal expressions within the scope of TimeML: date, time, 
set, and duration, which correspond to the value range of the type attribute of the TIMEX3 tag. In this section, we 
first provide a compact temporal lexicon for Turkish and in the following subsections, we present patterns for temporal 
expressions in Turkish and then samples conforming to these patterns. 

We should note that both the lexicon and the pattern bases are nowhere near exhaustive. We have tried to devise patterns
with as high coverage as we can, yet they are all open to modifications, corrections, and extensions, especially when
building practical systems for Turkish. Normalization is also not considered within the current study; a distinct set of
normalization rules should be devised for the extracted temporal expressions as part of the future work.

2.1. Turkish Lexicon for Temporal Expressions

We have built the Turkish lexicon for temporal expressions with the following lexical classes. The class identifiers are given 
in parentheses and they are used in the ultimate extraction patterns as the building blocks. 

1. The list of cardinal numerals from 1 to 2100, both in numbers and in words (<NUM>), and the list of the corresponding 
ordinal numbers (<ORD>). 

2. The names of days (<DAY>), that of months (<MON>), that of seasons (<SEAS>). 

3. The names of the parts of a day, like sabah (‘morning’), akşam (‘evening’) etc. (<D-PART>).

4. The names of the units of time, like saat (‘hour’), gün (‘day’) etc. (<T-UNIT>).

5. The modifiers of temporal expressions, like gelecek (‘next’), geçen (‘last’) etc. (<MOD>).

6. Deictic temporal expressions like şimdi (‘now’), dün (‘yesterday’) etc. (<DEIC>).

7. The determiners like her (‘every’) (<DET>).

8. The quantifiers like kere (‘times’ as in three times a day) (<QUANT>).

9. The suffixes that can be attached at the ends of temporal expressions like the case (including genitive and possessive) 
markers, plural markers, and relativizers in Turkish (a single such suffix is denoted as <SUF>). 

10. The apostrophe character (<APST>). 
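
As an illustration of how these lexical classes could be stored and turned into building blocks for patterns, a minimal sketch is given below. The dictionary layout and the helper names are hypothetical, and the entries are limited to words that occur in the examples of this paper; a practical lexicon would be much larger.

```python
import re

# Hypothetical, deliberately tiny lexicon with a few entries per class.
LEXICON = {
    "DAY": ["Pazartesi", "Pazar"],
    "MON": ["Mart", "Mayıs"],
    "SEAS": ["sonbahar"],
    "D-PART": ["sabah", "akşam"],
    "T-UNIT": ["saat", "gün", "hafta", "ay", "yıl"],
    "MOD": ["gelecek", "geçen"],
    "DEIC": ["şimdi", "dün", "bugün"],
    "DET": ["her"],
    "QUANT": ["kere", "kez"],
}


def class_regex(class_id):
    """Build a regex alternation matching any entry of one lexical class."""
    # Longest entries first, so that "Pazartesi" is tried before "Pazar".
    entries = sorted(LEXICON[class_id], key=len, reverse=True)
    return "(?:" + "|".join(re.escape(entry) for entry in entries) + ")"


def classes_of(word):
    """Return the identifiers of all classes that list the given word."""
    return [cid for cid, entries in LEXICON.items() if word in entries]
```

Matching is kept case-sensitive on purpose, because the capitalization of day and month names is part of the writing rules and because generic case folding does not handle the Turkish dotted and dotless i correctly.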

2.2. Date Expressions 

Before presenting the actual patterns for date expressions (<DATE-EXPR>), we first present patterns for the auxiliary
constructs <DAY-EXPR>, <MON-EXPR>, and <YEAR-EXPR>, which are in turn used within the <DATE-EXPR>
patterns. The patterns are presented as regular expressions where ? denotes zero or one, * denotes zero or more, | denotes
the OR operator and parentheses are for grouping purposes. The patterns may include both the classes of lexical entries,
described in the previous section, and rarely individual entries themselves, like yıl (‘year’), sene (‘year’), ay (‘month’),
gün (‘day’), and saat (‘hour’).

Though not denoted in the patterns, there are also constraints, regarding the lexical entries, that should be enforced during 
the utilization of the patterns. For instance, the <NUM> values within the <DAY-EXPR> should be within the range of 
[1..31] while the <NUM> values within the <MON-EXPR> should be within the range of [1..12]. 
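
Such constraints can be enforced as a separate check after a pattern has matched. The sketch below assumes that the matched <NUM> is available as a string and handles only numerals written in digits; the table and function names are hypothetical.

```python
# Value ranges stated in the text for numerals inside the auxiliary constructs.
NUM_RANGES = {
    "DAY-EXPR": range(1, 32),  # [1..31]
    "MON-EXPR": range(1, 13),  # [1..12]
}


def satisfies_constraint(construct, numeral):
    """Check a matched <NUM> string against the range of its construct."""
    if not numeral.isdecimal():
        # Numerals written in words would need a lookup table first.
        return False
    allowed = NUM_RANGES.get(construct)
    # Constructs without a stated range are accepted unchanged.
    return allowed is None or int(numeral) in allowed
```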


<DAY-EXPR> → (<NUM><APST> | (<ORD> | <DAY>) gün) <SUF>*

<MON-EXPR> → (<MON><APST><SUF>* | (<ORD> | <MON>) ay)<SUF>*

<YEAR-EXPR> → <NUM> ((yıl | sene) <SUF>*)?

Below provided are some wide-coverage patterns for extracting date expressions in Turkish. 

<DATE-EXPR> → (<NUM>.<NUM>.<NUM> | <NUM>/<NUM>/<NUM>) (1)

<DATE-EXPR> → <NUM>? <MON> <NUM>? <DAY>? (2)

<DATE-EXPR> → <YEAR-EXPR> <MON-EXPR>? <DAY-EXPR>? (3)

<DATE-EXPR> → <YEAR-EXPR> <NUM> (<MON><SUF>* | <MON> <DAY>?) (4)

<DATE-EXPR> → <MON-EXPR> <DAY-EXPR>? (5)

<DATE-EXPR> → <MOD>? (<T-UNIT> | <DAY> | <MON> | <SEAS>) (6)

<DATE-EXPR> → <DEIC> (7)
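
To illustrate how such patterns might be compiled for an actual extractor, patterns (1) and (2) are sketched below as Python regular expressions. This is only one possible encoding under simplifying assumptions: <NUM> covers digits only, the month and day lists contain just the names used in the examples of this paper, and the function name is hypothetical.

```python
import re

# Assumed simplifications of the lexical classes (see the lead-in above).
NUM = r"\d{1,4}"
MON = r"(?:Mart|Mayıs)"
DAY = r"(?:Pazartesi|Pazar)"

# Pattern (1): <NUM>.<NUM>.<NUM> | <NUM>/<NUM>/<NUM>
PATTERN_1 = re.compile(rf"\b{NUM}\.{NUM}\.{NUM}\b|\b{NUM}/{NUM}/{NUM}\b")

# Pattern (2): <NUM>? <MON> <NUM>? <DAY>?
PATTERN_2 = re.compile(rf"(?:\b{NUM} )?\b{MON}\b(?: {NUM}\b)?(?: {DAY}\b)?")


def find_dates(text):
    """Return (pattern number, matched text) pairs in order of appearance."""
    hits = [(m.start(), 1, m.group()) for m in PATTERN_1.finditer(text)]
    hits += [(m.start(), 2, m.group()) for m in PATTERN_2.finditer(text)]
    return [(number, matched) for _, number, matched in sorted(hits)]
```

The range constraints on <NUM> are not checked by these expressions and would have to be applied to the matches afterwards.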


Sample date instances conforming to some of these patterns are given in Table 1. In this table and the other tables in the
current paper, the first column shows the Turkish samples, the second column shows their meanings in English, the third
column shows the TIMEX3 annotation of the sample, and the fourth column shows the number of the pattern that the
sample conforms to. For the sample in the second to last row of Table 1, the normalized value is given with respect to a
reference date in the year 2015.





Table 1: Sample Date Expressions in Turkish.

| Date Expression | Meaning | TIMEX3 Annotation | Pattern |
|---|---|---|---|
| 23.03.2015 | 23.03.2015 | <TIMEX3 tid="t1" type="DATE" value="2015-03-23">23.03.2015</TIMEX3> | (1) |
| 23 Mart 2015 | March 23, 2015 | <TIMEX3 tid="t1" type="DATE" value="2015-03-23">23 Mart 2015</TIMEX3> | (2) |
| 23 Mart 2015 Pazartesi | March 23, 2015 Monday | <TIMEX3 tid="t1" type="DATE" value="2015-03-23">23 Mart 2015 Pazartesi</TIMEX3> | (2) |
| 2015 yılının Mart’ının 23’ü | the 23rd of the March of the year 2015 | <TIMEX3 tid="t1" type="DATE" value="2015-03-23">2015 yılının Mart'ının 23'ü</TIMEX3> | (3) |
| 2015 yılı 23 Mart’ı | the 23rd of the March of the year 2015 | <TIMEX3 tid="t1" type="DATE" value="2015-03-23">2015 yılı 23 Mart'ı</TIMEX3> | (4) |
| Mart ayının ikinci günü | the second of March | <TIMEX3 tid="t1" type="DATE" value="XXXX-03-02">Mart ayının ikisi</TIMEX3> | (5) |
| geçen sonbahar | last autumn | <TIMEX3 tid="t1" type="DATE" value="2014-FA">geçen sonbahar</TIMEX3> | (6) |
| şimdi | now | <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF">şimdi</TIMEX3> | (7) |
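
Although normalization rules are outside the scope of this study, the annotations in Table 1 indicate what such a rule has to produce. The sketch below covers only fully specified dates of the form <NUM> <MON> <NUM>, as in the second row of the table; the month table and the function name are assumptions made for illustration.

```python
# Assumed month table, limited to the two months used in this paper's examples.
MONTHS = {"Mart": 3, "Mayıs": 5}


def timex3_date(expression, tid="t1"):
    """Annotate a '<NUM> <MON> <NUM>' date such as '23 Mart 2015'."""
    day, month_name, year = expression.split()
    # The value attribute follows the YYYY-MM-DD layout used in Table 1.
    value = f"{int(year):04d}-{MONTHS[month_name]:02d}-{int(day):02d}"
    return f'<TIMEX3 tid="{tid}" type="DATE" value="{value}">{expression}</TIMEX3>'
```

Underspecified and deictic expressions, such as the last three rows of Table 1, need additional rules and a reference date, which this sketch does not attempt to model.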


2.3. Time Expressions 

Below listed are the patterns for the common time expressions in Turkish and samples conforming to these patterns are
provided in Table 2. As the final pattern denotes, some time patterns make use of date expressions extracted as well and
can be recursive.

<TIME-EXPR> → <D-PART>? saat? (<NUM>.<NUM> | <NUM>:<NUM>) (8)

<TIME-EXPR> → <D-PART>? saat <NUM> (9)

<TIME-EXPR> → <DAY>? <D-PART> saat<SUF>* (10)

<TIME-EXPR> → <DAY>? <D-PART><SUF>* (11)

<TIME-EXPR> → <DATE-EXPR> <TIME-EXPR> (12)
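
As a rough illustration, pattern (8) can be encoded as follows. The sketch assumes that <NUM> is written in digits, restricts <D-PART> to two entries, and adds a clock-time check that is not expressed in the pattern itself; all names are hypothetical.

```python
import re

# Assumed simplification: only two <D-PART> entries from the lexicon examples.
D_PART = r"(?:sabah|akşam)"

# Pattern (8): <D-PART>? saat? (<NUM>.<NUM> | <NUM>:<NUM>)
PATTERN_8 = re.compile(
    rf"(?:{D_PART} )?(?:saat )?\b(?P<hour>\d{{1,2}})[.:](?P<minute>\d{{2}})\b"
)


def time_value(text):
    """Return a TIMEX3-style value such as 'T14:00' for the first match."""
    match = PATTERN_8.search(text)
    if match is None:
        return None
    hour, minute = int(match["hour"]), int(match["minute"])
    # Constraint not written in the pattern itself: a valid clock time.
    if hour > 23 or minute > 59:
        return None
    return f"T{hour:02d}:{minute:02d}"
```

Since a prefix of a date like 23.03.2015 also fits this expression, an extractor built this way would presumably need to try the date patterns first or reject matches that are followed by a further numeric field.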


Table 2: Sample Time Expressions in Turkish.

| Time Expression | Meaning | TIMEX3 Annotation | Pattern |
|---|---|---|---|
| 11.30 | 11.30 | <TIMEX3 tid="t1" type="TIME" value="T11:30">11.30</TIMEX3> | (8) |
| sabah saat dokuz | nine o’clock in the morning | <TIMEX3 tid="t1" type="TIME" value="T09:00">sabah saat dokuz</TIMEX3> | (9) |
| sabah saatleri | morning hours | <TIMEX3 tid="t1" type="TIME" value="TMO">sabah saatleri</TIMEX3> | (10) |
| Pazartesi sabahı | Monday morning | <TIMEX3 tid="t1" type="TIME" value="XXXX-WXX-1TMO">Pazartesi sabahı</TIMEX3> | (11) |
| 2 Mayıs saat 14:00 | 14:00 o’clock, May 2 | <TIMEX3 tid="t1" type="TIME" value="XXXX-05-02T14:00">2 Mayıs saat 14:00</TIMEX3> | (12) |

2.4. Set Expressions

Below provided are common patterns for the extraction of set expressions and sample set expressions conforming to these
patterns are listed in Table 3.

<SET-EXPR> → <DET> (<T-UNIT> | <DAY> | <MON> | <SEAS>) (13)

<SET-EXPR> → <T-UNIT><SUF> <NUM> <QUANT>? (14)

<SET-EXPR> → <DET>? <NUM>? <T-UNIT><SUF> <NUM> <QUANT>? (15)
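
A possible encoding of pattern (13), together with the annotation it leads to, is sketched below. The value table is an assumption: it takes the values that appear in Table 3 and adds the analogous ISO 8601 style period codes for a few other time units. The determiner is restricted to her (‘every’).

```python
import re

# Assumed value table for a few heads of pattern (13).
SET_VALUES = {
    "gün": "P1D",
    "hafta": "P1W",
    "ay": "P1M",
    "yıl": "P1Y",
    "Pazartesi": "XXXX-WXX-1",
}

# Pattern (13), restricted to <DET> = "her" and the heads listed above.
PATTERN_13 = re.compile(r"\bher (" + "|".join(SET_VALUES) + r")\b")


def annotate_set(text, tid="t1"):
    """Return a TIMEX3 SET annotation for the first match, or None."""
    match = PATTERN_13.search(text)
    if match is None:
        return None
    value = SET_VALUES[match.group(1)]
    return (
        f'<TIMEX3 tid="{tid}" type="SET" value="{value}" '
        f'quant="EVERY">{match.group()}</TIMEX3>'
    )
```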


Table 3: Sample Set Expressions in Turkish.

| Set Expression | Meaning | TIMEX3 Annotation | Pattern |
|---|---|---|---|
| her ay | every month | <TIMEX3 tid="t1" type="SET" value="P1M" quant="EVERY">her ay</TIMEX3> | (13) |
| her Pazartesi | every Monday | <TIMEX3 tid="t1" type="SET" value="XXXX-WXX-1" quant="EVERY">her Pazartesi</TIMEX3> | (13) |
| haftada iki kez | twice a week | <TIMEX3 tid="t1" type="SET" value="P1W" freq="2X">haftada iki kez</TIMEX3> | (14) |
| her iki günde bir | once every two days | <TIMEX3 tid="t1" type="SET" value="P2D" quant="EVERY">iki günde bir</TIMEX3> | (15) |


2.5. Duration Expressions 

The two patterns for the extraction of duration expressions in Turkish are given below and three related samples are
provided in Table 4.

<DURATION-EXPR> → <NUM> <T-UNIT> (16)

<DURATION-EXPR> → <T-UNIT><SUF>* (17)
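
Pattern (16) maps quite directly onto ISO 8601 style period values such as P2D. The following sketch shows one way to compute them; the lookup tables are assumptions limited to words that occur in the samples of this paper, and hour-level units are left out because they would need the PT prefix.

```python
import re

# Assumed lookup tables, limited to words that occur in this paper's samples.
NUMBER_WORDS = {"bir": 1, "iki": 2, "sekiz": 8, "dokuz": 9}
UNIT_CODES = {"gün": "D", "hafta": "W", "ay": "M", "yıl": "Y"}

# Pattern (16): <NUM> <T-UNIT>
PATTERN_16 = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r") (" + "|".join(UNIT_CODES) + r")\b"
)


def duration_value(text):
    """Return a period value such as 'P2D' for the first match, or None."""
    match = PATTERN_16.search(text)
    if match is None:
        return None
    numeral, unit = match.groups()
    amount = int(numeral) if numeral.isdecimal() else NUMBER_WORDS[numeral]
    return f"P{amount}{UNIT_CODES[unit]}"
```

Expressions without a numeral, which fall under pattern (17), are not matched by this sketch and would receive an underspecified value.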


Table 4: Sample Duration Expressions in Turkish.

| Duration Expression | Meaning | TIMEX3 Annotation | Pattern |
|---|---|---|---|
| iki gün | two days | <TIMEX3 tid="t1" type="DURATION" value="P2D">iki gün</TIMEX3> | (16) |
| sekiz hafta | eight weeks | <TIMEX3 tid="t1" type="DURATION" value="P8W">sekiz hafta</TIMEX3> | (16) |
| yıllar | years | <TIMEX3 tid="t1" type="DURATION" value="PXY">yıllar</TIMEX3> | (17) |


3. Open Issues 

The open issues on temporal expression extraction from Turkish texts include the following:


• The development of temporal expression extraction and normalization systems is an important open issue for Turkish.
A convenient system can be achieved by (i) building a rule-based/learning system from scratch, or by (ii) extending
an already existing and open-source temporal expression extractor, like HeidelTime (Strötgen and Gertz, 2010)
or SUTime (Chang and Manning, 2013), to Turkish, or by (iii) extending an already existing Turkish NER system
recognizing date and time expressions, like (Küçük and Yazıcı, 2009), to make it a full-fledged temporal expression
extractor. Deeper examinations of these tools are definitely necessary to assess the feasibility of each option, yet, the
second and the third options currently seem less labor-intensive compared to the first one.


• Due to the agglutinative nature of Turkish, the tokens within the temporal expressions can have sequences of suffixes
attached, as demonstrated in the proposed patterns given in the previous section. So, a convenient morphological
analyzer should be considered for inclusion into the prospective systems.


• In order to train and test the prospective temporal expression extraction and normalization proposals for Turkish,
conveniently annotated corpora in Turkish are necessary. To the best of our knowledge, no such resource, in other
words, no Turkish Timebank exists currently. Actually, this lack of annotated corpora is an issue even for the more
commonly studied problem of NER on Turkish texts. The only study that describes a publicly-available Turkish corpus
(of tweets) annotated with the MUC-style basic named entity types (person, location, and organization names, money
and percentage expressions, along with date and time expressions) is presented in (Küçük et al., 2014). This annotated
resource can be used as a starting point to build a Turkish Timebank, though it should be noted that no normalization
information exists for the annotated date and time expressions in the current form of the resource.


• After the developments to be carried out within the course of the previous two items above, temporal signals (to be 
annotated with the SIGNAL tag) and events can be included within the scopes of the system proposals to fully comply 
with the TimeML specifications. Thereby, a full-fledged temporal expression and event extraction system can be 
achieved for Turkish. 


4. Conclusion 

Temporal expression extraction is an important information extraction task and the corresponding extraction tools make 
significant contributions to larger natural language processing tasks. In this paper, we present a TimeML-based analysis of 
temporal expressions in Turkish as related studies on Turkish texts are quite rare. We first describe a temporal lexicon and 
then use the classes in the lexicon as the building blocks to devise a total of 17 wide-coverage patterns for the extraction 
of date, time, set, and duration expressions in Turkish. We also provide samples of temporal expressions in Turkish along 
with the related open issues. 